iCALL corpus: Mandarin Chinese spoken by non-native speakers of European descent
Abstract
We present iCALL, a speech corpus developed at the Institute for Infocomm Research (I²R) in Singapore and designed to evaluate Mandarin Chinese pronunciation patterns of non-native speakers of European descent. To the best of our knowledge, iCALL is larger than any non-native corpus reported to date in number of utterances, total duration, and number of speakers: it consists of 90,841 utterances from 305 speakers, with a total duration of 142 hours. The speakers are gender-balanced, come from diverse native-language backgrounds, and represent a realistic sample of the age range of adult Mandarin learners. The read utterances are phonetically balanced and vary in length (words, phrases, and sentences). The spoken utterances are phonetically transcribed and perceptually rated for fluency by trained native speakers of Mandarin. In this work, we share our experience in corpus design, data collection, and human annotation, and we analyze phonetic and tonal error patterns, in particular their relationship with speaker demographics and utterance length. Potential applications of the iCALL corpus include computer-assisted pronunciation training (CAPT), lexical tone recognition, automatic fluency assessment, accent recognition, and accented Mandarin speech recognition.
Similar resources
Annotation and features of non-native Mandarin tone quality
Native speakers of non-tonal languages, such as American English, frequently have difficulty accurately producing the tones of Mandarin Chinese. This paper describes a corpus of Mandarin Chinese spoken by non-native speakers and annotated for tone quality using a simple good/bad system. We examine interrater correlation of the annotations and highlight the differences in feature distribution be...
The Role of Phoneme in Mandarin Chinese Production: Evidence from ERPs
Established linguistic theoretical frameworks propose that alphabetic language speakers use phonemes as phonological encoding units during speech production whereas Mandarin Chinese speakers use syllables. This framework was challenged by recent neural evidence of facilitation induced by overlapping initial phonemes, raising the possibility that phonemes also contribute to the phonological enco...
Production of English Prominence by Native Mandarin Chinese Speakers
Native-like production of intonational prominence is important for spoken language competency. Non-native speakers may have trouble producing prosodic variation in a second language (L2) and thus, problems in being understood. By identifying common sources of production error, we will be able to aid in the instruction of L2 speakers. In this paper we present results of a production study design...
A Corpus-based Study on Figurative Language through the Chinese Five Elements and Body Part Terms
Using a corpus-based approach, this paper analyzes figurative language through observing the Chinese five elements (五行) of 金 ‘metal,’ 木 ‘wood,’ 水 ‘water,’ 火 ‘fire’ and 土 ‘earth.’ This work found that there are at least two types of figurative language in Mandarin Chinese – one of which occurs at the morphosyntactic level and the other occurs during the mappings between two domains (between th...